Comparison of statistical methods used to meta-analyse results from interrupted time series studies: an empirical study

Background The Interrupted Time Series (ITS) is a robust design for evaluating public health and policy interventions or exposures when randomisation may be infeasible. Several statistical methods are available for the analysis and meta-analysis of ITS studies. We sought to empirically compare available methods when applied to real-world ITS data. Methods We sourced ITS data from published meta-analyses to create an online data repository. Each dataset was re-analysed using two ITS estimation methods. The level- and slope-change effect estimates (and standard errors) were calculated and combined using fixed-effect and four random-effects meta-analysis methods. We examined differences in meta-analytic level- and slope-change estimates, their 95% confidence intervals, p-values, and estimates of heterogeneity across the statistical methods. Results Of 40 eligible meta-analyses, data from 17 meta-analyses including 282 ITS studies were obtained (predominantly investigating the effects of public health interruptions (88%)) and analysed. We found that on average, the meta-analytic effect estimates, their standard errors and between-study variances were not sensitive to meta-analysis method choice, irrespective of the ITS analysis method. However, across ITS analysis methods, for any given meta-analysis, there could be small to moderate differences in meta-analytic effect estimates, and important differences in the meta-analytic standard errors. Furthermore, the confidence interval widths and p-values for the meta-analytic effect estimates varied depending on the choice of confidence interval method and ITS analysis method. Conclusions Our empirical study showed that meta-analysis effect estimates, their standard errors, confidence interval widths and p-values can be affected by statistical method choice. These differences may importantly impact interpretations and conclusions of a meta-analysis and suggest that the statistical methods are not interchangeable in practice. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-024-02147-z.


Introduction
Systematic reviews may be used to collate and synthesise evidence on the effects of interventions targeted at populations (e.g., effects of a country-wide ban on smoking rates [1]) or the impacts of exposures (e.g., impacts of flooding events [2]).These reviews may include evidence beyond randomised trials by necessity, because trials may not be possible (in the case of exposures) or feasible (in the case of interventions targeted at populations) [3].The interrupted time series (ITS) may be considered for inclusion in such reviews because this design is often used to examine population-level interventions and exposures, when randomisation is not possible (e.g., for ethical reasons, when a policy targets an entire population).Furthermore, this design is considered a robust alternative for evaluating the impact of population-level interventions / exposures [4][5][6][7].The results across the included ITS studies may be statistically combined using meta-analysis, providing a combined estimate of the intervention / exposure's impact [8,9].
In a classical ITS study, data are collected over time both before and after an intervention or exposure (henceforth referred to as an 'interruption'), and aggregated using summary statistics over regular time intervals [10].For example, in Ejlerskov et al. [11], the interruptions examined were policies implemented in six supermarkets that aimed to reduce the purchasing of less-healthy foods that are commonly displayed at supermarket checkouts.The outcome examined was the number of checkout food purchases, aggregated into four-weekly periods (Fig. 1, Additional file 1: Figure S1) [11].While the ITS design may also be used to examine the effects of an intervention on individuals (in which multiple measurements are taken before and after the intervention for each individual), we do not consider the use of the ITS design in this context further [12,13].
In the analysis of data from this classical ITS design, a commonly fitted model structure is the segmented linear model [14,15].This model allows estimation of separate trends before and after the interruption (referred to as the pre-and post-interruption trends).Hence the advantage of the ITS design is that the series acts as its own control; the pre-interruption trend can be projected into the post-interruption period, which, when modelled correctly, provides a counterfactual for what would have occurred in the absence of the interruption [5,14,15].The impact of the interruption can then be estimated by comparing the counterfactual with the observed post-interruption trend.A variety of effect metrics can be calculated, including levelchange (e.g., immediately following the interruption) and slope-change [7,16].
When estimating the regression parameters of a segmented linear model, characteristics of time series data need to be accounted for [17].One of these characteristics is autocorrelation, which allows for the fact that values of near neighbouring datapoints may be more similar (or different) than distant datapoints [7,18,19].If autocorrelation is unaccounted for [e.g., when using ordinary least squares (OLS), in the presence of (likely) positive autocorrelation] the regression parameter standard errors may be underestimated [17,20,21].Several estimation methods are available to account for autocorrelation [e.g., restricted maximum likelihood (REML), Prais-Winsten (PW)] [20,22,23].
Two-stage meta-analysis may be used to combine effects across ITS studies.In the first stage, segmented linear models are fitted to each ITS study to obtain interruption effect estimates and their standard errors [24,25].These estimates may be reported in the primary publications, or the systematic reviewer may re-analyse the time series data to obtain the required estimates [26].Then, in the second stage, the effect estimates are combined using a meta-analysis model; commonly either a fixed (common) effect or random-effects meta-analysis model [24].Fixed-effect meta-analysis weights studies by the inverse of the variance of their estimated effect, and hence analysis requires only the effect estimates and their standard errors.However, the random-effects method weights additionally involve the between-study variance, a parameter which must be estimated and for which many estimators are available [24,[27][28][29].Furthermore, there exist many confidence interval methods for the summary (combined) meta-analytic effect [30].
We previously undertook a numerical simulation study examining the performance of different meta-analysis methods to combine results from ITS studies with continuous outcomes, and how characteristics of the metaanalysis, ITS design, and method of analysis of the individual ITS studies modified the performance [31].We examined ITS analysis and meta-analysis methods that are commonly used, or have been shown through numerical simulation to be preferable [20,29,30].We found that all random-effects methods yielded confidence interval coverage for the summary effect close to the nominal level, irrespective of the ITS analysis method used.However, the between-study variance was overestimated in some scenarios [31].In this companion study, we aimed to demonstrate empirically how the same methods compare when applied to real-world data, and answer the question: does statistical method choice importantly impact the meta-analysis results?Together, the simulation and empirical studies allow for a more complete understanding of which methods should be used in different scenarios.Specifically, our objectives were to: i) compare the meta-analysis estimates of the immediate level-change and slope-change, their standard errors, confidence intervals and p-values, and the estimates of between-study variance obtained from different

Overview of the methods
An overview of the steps and corresponding Sections is depicted in Fig. 2. In brief, we sourced ITS data from published meta-analyses (sections Identification of reviews and meta-analyses and Methods to obtain time series data) and re-analysed them using two ITS analysis estimation methods (section Interrupted time series (ITS) analysis methods).The level-change and slopechange effect estimates (and their associated standard errors) were meta-analysed using a fixed-effect and four random-effects meta-analysis methods (section Metaanalysis methods).We compared the meta-analysis effect estimates, their standard errors, confidence intervals and p-values, and estimates of the between-study variance, across the meta-analysis methods (sections Analysis and meta-analysis of the ITS datasets and Comparison Fig. 2 Depiction of the analysis methods used in this empirical study.*The estimation methods for ITS analysis are listed in order of preference, i.e.REML is used whenever it converges and the estimated autocorrelation is between -1 and 1, while PW followed by OLS are used in the case of non-convergence.ITS interrupted time series, REML restricted maximum likelihood, PW Prais-Winsten, OLS ordinary least squares, DL DerSimonian and Laird, WT Wald-type, HKSJ Hartung-Knapp/Sidik-Jonkman of results from different ITS-analysis and meta-analysis methods).

Identification of reviews and meta-analyses
We sourced data for the present study from our previous methodological review that examined the statistical approaches used in reviews that include meta-analysis of ITS studies [26].In brief, we searched eight electronic databases and included reviews containing at least one meta-analysis that included at least two ITS studies (using the review authors' definition of an ITS).From each review, meta-analysis methods were examined for a single comparison-outcome (see the methodological review protocol for selection details [32]).In addition, reviews were eligible for the present study if: 1) The review's meta-analysis included at least two ITS studies that had at least three datapoints before and after an interruption and a clearly defined interruption timepoint; and 2) The raw time series data were available.Data was classified as unavailable, if for example, the review authors had directly extracted effect estimates from the primary studies, or if it was not clear if the review authors had directly extracted effect estimates from the primary studies or re-analysed the raw time series data.

Methods to obtain time series data
We sought the raw time series data using the following hierarchy of approaches: 1. Sourced the time series data from the review (e.g., where the data were available in supplementary files).2. Contacted (via email) the corresponding author of the review, and requested the time points (and time unit, e.g., week, month), aggregate summary statistic (e.g., mean, rate, proportion), and time point(s) at which the interruption(s) occurred for each ITS. 3. Digitally extracted time series data from published figures in the review using WebPlotDigitizer [33].This data extraction tool has been shown to yield data that can be used to obtain accurate estimates of the effect estimates and standard errors from published ITS graphs [34].
We only sought time series data from authors of the reviews, and not authors of the primary studies, for reasons of feasibility.

Statistical model for an ITS analysis
We fitted the following segmented linear regression model to each of the included ITS studies [5]: The continuous outcome at time t(t = 1, . . ., T ) is rep- resented by Y t .The series are divided into two segments, before and after the interruption.The time of the interruption (I) occurs at time T I .The segments are identi- fied by D t ( D t = 1 (t≥T I ) in the post-interruption period) (Additional file 1: Figure S1) .β 0 represents the intercept in the pre-interruption period, β 1 the pre-interruption slope, and β 2 and β 3 represent the interruption effects- respectively, immediate level-change and slope-change.The error term accommodates lag-1 (AR(1)) autocorrelation ( ρ ) via ε t = ρε t−1 + w t , ( w t ∼ N (0, 1) ); where ρε t−1 allows for correlation between the current and the previous time point.Longer lags (i.e., higher order autocorrelation) can be modelled; however, we did not consider these here since we did not investigate longer lags in our companion numerical simulation study [31].

Estimation methods for ITS analysis
We used three statistical estimation methods for the analysis of the included ITS studies.These methods were selected because they are commonly used in practice [35], or have been shown to have improved statistical performance (via numerical simulation) [20,22].Briefly, the methods were: • Ordinary least squares (OLS) [17], which assumes that the model errors are uncorrelated between observations.In the presence of positive autocorrelation, which has been shown to frequently occur in time series data [36], this assumption is violated, leading to potential underestimation of the variances of the regression parameters [15,37]; • Prais-Winsten (PW), which is a generalised leastsquares extension of OLS.PW estimation involves fitting the model using OLS and estimating lag-1 autocorrelation from the residuals, then, transforming the data using the estimated autocorrelation and re-estimating the regression parameters [23].The aim is to remove the autocorrelation from the errors, which may require multiple iterations for the estimated autocorrelation to converge [23].Accounting for autocorrelation in this way has been shown to improve estimation of the regression parameter standard errors compared with OLS estimation in the presence of autocorrelation; however, the stand- ard errors are still underestimated using PW, particularly when there are few datapoints [20].
• Restricted Maximum Likelihood (REML), which is a form of maximum likelihood (ML) estimation, attempts to avoid the underestimation of the variance (and covariance) parameter estimates that can arise with ML estimation.REML involves separate estimation of the (co)variance parameters to account for the loss in degrees of freedom due to estimation of the regression parameters [22].In the context of ITS studies, while both ML and REML directly estimate and adjust standard errors for autocorrelation, ML has been shown to yield less biased standard errors of the regression parameters compared with REML when autocorrelation was small, but positively biased standard errors when autocorrelation was large [20,22].

Meta-analysis methods
We used meta-analysis to combine the interruption effect estimates calculated using the methods in section Interrupted time series (ITS) analysis methods for each ITS study.We examined five meta-analysis methods, selected because they are frequently used in practice, or are known to have more favourable statistical properties.

Statistical models for meta-analysis
We examined a fixed-effect (common effect) and four random-effects models.The fixed-effect model is specified by: where it is assumed that each of the K included ITS stud- ies provide an estimate ( β mk ) of a single true interruption effect common to all studies, β m (where m indicates the regression parameter of interest from Eq. 1, such as β 2 for immediate level-change), and any within-study error in the estimation is due to sampling variability alone, ε mk ∼ N (0, σ 2 mk ).The random-effects meta-analysis model is specified by: where it is assumed that each of the K ITS studies pro- vide an estimate ( β mk ) of a true interruption effect spe- cific to the k th study (i.e., β * m + δ mk ), where β * m represents the mean of the distribution of true interruption effects (for the m th regression parameter) and δ mk represents a random effect in the k th ITS study, which are assumed to be normally distributed about the mean with a (2) between-study variance τ 2 m .The within-study error in estimating the k th ITS study's interruption effect from a sample of participants is represented by ε * mk ∼ N (0, σ 2 mk ).

Estimation methods for meta-analysis
The meta-analytic effect of the m th regression parame- ter is calculated as a weighted average of the K ITS study effect estimates, (with a variance of 1

W mk
).The weight given to the k th ITS study is the reciprocal of the within-study variance, W mkFE = 1 σ 2 mk when using a fixed-effect model, or when using a random-effects model.Different betweenstudy variance ( τ 2 m ) estimators are available [29], as well as methods to calculate the confidence interval for the meta-analytic effect [30].We used two between-study variance estimators and two confidence interval methods.
We examined the following between-study variance estimators: • DerSimonian and Laird (DL) [38], which is a momentbased between-study variance estimator derived from Cochran's Q-statistic, was selected for evaluation in this study because it is commonly used in practice [26,29].However, DL is well known to yield biased estimates of the between-study variance in particular scenarios (i.e., small underlying between-study variance and few studies; or, many studies and large underlying heterogeneity) [31,39,40]; • Restricted Maximum Likelihood (REML), which is an iterative between-study variance estimator that attempts to correct for the negative bias associated with the ML estimator [29].REML has been recommended as an alternative estimator because of its slightly improved performance compared with DL, and for this reason was selected for evaluation in this study [29,40,41].
We examined two confidence interval methods for the meta-analytic effect, which can be used with both the DL and REML between-study variance estimators: • The Wald-type normal distribution (WT) confidence interval method [42], which uses the standard normal distribution to calculate the confidence limits.This method maintains the assumption of normality of β * m despite the within-study and between-study variances not being known and instead estimated [28,30].The WT method relies on large-sample approximations, which are not generally met in the context of meta-analysis due to few included studies [43,44].This can lead to lower than nominal levels of 95% confidence interval coverage, particularly when there are few included studies or the between-study variance is large [30].
• The Hartung-Knapp [45]/Sidik-Jonkman [46] (HKSJ) confidence interval method, which attempts to overcome the assumption that the within-study variance is known and the between-study variance is accurately estimated, in scenarios where these conditions are unlikely to be met (e.g., meta-analyses with few studies of small sample sizes).The method involves making a small sample adjustment to the meta-analysis standard error and uses the t-distribution (with K-1 degrees of freedom) in the calculation of the confidence limits.This adjustment yields wider confidence intervals than the WT method, except when there are few studies and the estimated betweenstudy variance is zero [29].

Analysis and meta-analysis of the ITS datasets
Prior to fitting the models, we excluded ITS from the meta-analyses where the study i) did not meet our minimum required number of datapoints, or ii) had a large proportion of time series datapoints that were zero (i.e., greater than 40%), such that it was not reasonable to assume that the error term would be normally distributed.In addition, we removed any control series that were included in the original meta-analysis, because our interest was in the interrupted series only.Furthermore, we excluded segments of studies that had multiple interruptions.Specifically, we only included the first interruption (and the adjacent segments).Additional file 1: Table S1 includes all modifications, with justifications.Modifications were discussed and agreed upon at team meetings (including authors EK, SLT, ABF, AK and JEM).We fitted a segmented linear regression model (section Statistical model for an ITS analysis, Eq. 1) to each ITS study and estimated the regression parameters (immediate level-change ( β 2 ) and slope-change ( β 3 )) using both OLS and REML (section Estimation methods for ITS analysis) (Fig. 2).If REML failed to converge or to yield an estimate of autocorrelation between -1 and 1, we used PW, and where PW failed, we used OLS.Given the outcomes varied across the meta-analyses, we standardised the ITS study effect estimates (immediate level-change, slope-change) prior to meta-analysis, so that the resulting meta-analysis effect estimates were standardised and comparable across meta-analyses.The ITS effect estimates obtained via REML, PW and OLS were standardised by dividing them (and their standard errors) by the root mean square error estimated from the OLS analysis.Slope-change effect estimates were then standardised, if required, to reflect the standardised slope-change per month by multiplying or dividing by an appropriate factor (e.g., slope-change calculated from a series with yearly timepoints was divided by 12 to reflect the slope-change per month).
The standardised ITS study level-change and slopechange estimates were then meta-analysed (separately) using five meta-analysis methods (section Estimation methods for meta-analysis; Fig. 2).We standardised the direction of these meta-analysis effects so that for all a positive estimate reflected a beneficial impact of the interruption.This was achieved by multiplying the metaanalysis estimates where a negative estimate was beneficial (e.g., a decrease in fatality rates) by -1, to reverse the direction of interpretation.
We undertook sensitivity analyses to investigate whether the results were robust to our choice of threshold for excluding ITS based on the proportion of datapoints that were zero.For the sensitivity analysis, we excluded ITS from the meta-analyses where the study had greater than 30% but less 40% of time series datapoints that were zero.We then repeated the above analyses and informally compared the results.

Comparison of results from different ITS-analysis and meta-analysis methods
We compared meta-analysis effect estimates (i.e., immediate level-change and slope-change), and their standard errors between each of the combinations of ITS analysis methods and meta-analysis methods.For each pairwise comparison between the combinations, we calculated (and tabulated) the average of the differences between the estimates (i.e., the mean difference = the sum of the differences between the estimates yielded by the two methods being compared, divided by the number of meta-analyses, 17) and the limits of agreement (calculated as the mean difference ± 1.96 × standard deviation of the differences) [52].The limits of agreement provide a range within which most of the differences between estimates will lie [52].For the standard errors, we first logtransformed these to remove the relationship between the variability of the differences and the magnitude of the standard errors [52].We used Bland-Altman scatter plots to visualise the agreement, whereby, for each pairwise comparison between combinations, we plotted the difference between the estimates vs their average [52].
We compared confidence interval widths between each of the combinations of ITS analysis and meta-analysis methods.For each pairwise comparison, we plotted the ratio of the confidence intervals, scaled such that the reference confidence interval width spanned -0.5 to 0.5 (following the approach of Turner et al. [36]).
We compared the estimates of between-study variance ( τ 2 ) between each combination of ITS analysis meth- ods and between-study variance estimators.For each meta-analysis and pairwise comparison, we calculated (and tabulated) the median and interquartile range (IQR) of the differences between the estimates of the betweenstudy variance.
We compared the p-values of the meta-analytic levelchange and slope-change estimates between each of the combinations of ITS analysis and meta-analysis methods.We categorised the p-values using the conventionally used (though not recommended) statistical significance Fig. 3 Flow diagram of included reviews, their meta-analysis and interrupted time series (ITS) studies.*54 reviews were identified in a methodological systematic review (see Korevaar et al. [26] for the search strategy used).**Two authors that were contacted did not provide data, as such, we digitally extracted the raw time series data from the figures provided in the review manuscripts threshold of 0.05.The percentage of meta-analyses where there was agreement in the categories of statistical significance was calculated.Namely, the percentage of meta-analyses where the p-value for the effect estimate from both methods was < 0.05 or ≥ 0.05.Agreement between the statistical methods in the conclusion about the statistical significance was further quantified using the kappa statistic, where we used the following adjectives to describe agreement: moderate agreement as a kappa value of 0.41-0.6,substantial agreement as a value of 0.61-0.8,and almost perfect agreement as a value of 0.81-1.0[53].

Results
Of the 54 reviews included in the source methodological review [26], 40 met the additional eligibility criteria for the present study (Fig. 3).We extracted data from the supplementary material of two reviews, and emailed the remaining 38 review authors.Of these, 35 emails were successfully delivered, from which 13 authors provided data.For a further two reviews, it was possible to digitally extract data from the ITS graphs included in the reviews.This resulted in the inclusion of 17 meta-analyses with 390 ITS.We further excluded 108 ITS from these metaanalyses for a variety of reasons (Fig. 3), leaving 282 ITS (from 17 meta-analyses) for our primary analyses.

Table 1 Characteristics of included meta-analyses and ITS studies
ITS interrupted time series, IQR interquartile range, PW Prais-Winsten, REML restricted maximum likelihood a Multiple response options possible therefore percentages sum to greater than 100% b Interruptions classified as 'individual-level interruption' were the intervention directed at an individual (e.g., delivery of a vaccine), however, the measurements were still aggregated over units of time (e.g., number of vaccinations each year) c Combination indicates where reviewers combined multiple data types (e.g., combining studies using proportion and rate outcomes) d The average number of datapoints and autocorrelation were calculated across the series included in each meta-analysis.These averages were then summarised across the meta-analyses using the median and IQR e Autocorrelation estimated using REML (or PW where REML failed to converge)

Characteristics of the included meta-analyses and ITS studies
The

Convergence of ITS analyses and meta-analyses using REML
Of the 282 ITS that were analysed using REML, 255 (90%) converged.For those that did not converge, PW was used, of which 4/27 (19%) failed to converge.OLS was used for the four that did not converge.All metaanalyses using REML converged.

Estimates of level-and slope-change meta-analytic effect estimates
When fixed-effect meta-analysis was fitted, on average, REML ITS analysis yielded slightly larger estimated immediate level-changes compared with OLS (depicted by the horizontal solid orange line, representing the average of the differences, being greater than zero in Fig. 4, solid red box; Table 2), but with wide limits of agreement (depicted by the horizontal dashed orange lines being wide), largely due to the influence of one outlying estimated level-change using REML.The different between-study variance estimators (i.e., using DL or REML) had no impact on the immediate level-change within ITS analysis method (i.e., OLS ITS analysis with the DL between-study variance estimator vs OLS ITS with the REML estimator; REML ITS analysis with the DL between-study variance estimator vs REML ITS with the REML estimator), as depicted by the horizontal solid orange line sitting on zero, and the limits of agreement being close to zero in Fig. 4 (solid blue boxes).Furthermore, the estimated meta-analytic immediate level-changes were, on average, similar across the combinations of between-study variance estimators and ITS analysis methods (Fig. 4 solid black boxes); however, the limits of agreement (which were approximately ±0.33 ) showed that methods could yield small to moderate differences in estimates of level-change for a given metaanalysis.The patterns were similar for the effect estimates of the meta-analytic slope-change per month (see Fig. 4, dashed boxes).

Standard errors of the level-and slope-change meta-analytic effects
The standard errors of the meta-analytic level-change were most influenced by the meta-analysis model, with the standard errors being substantially larger when a random-effects model was fitted (as depicted by the horizontal solid orange line being greater or less than zero, depending on the order of the comparisons, in Fig. 5, yellow boxes, and Table 3).When random-effects metaanalysis methods were fitted, on average, there were no important differences in the standard errors of the metaanalytic level-change (depicted by the horizontal solid orange line sitting on zero in Fig. 5), across ITS analysis methods (black boxes), between-study variance estimators (blue boxes) or where there was a small sample adjustment made to the meta-analysis standard error (as occurs with the HKSJ method)[red boxes].However, the limits of agreement were wide across ITS analysis methods (black boxes) and where there was a small sample adjustment (red boxes); for example, the limits of agreement for the comparison of REML ITS vs OLS ITS analysis (both with the REML between-study variance estimator and HKSJ confidence interval method) suggest that the meta-analysis estimate of standard error is likely to be between 37% smaller to 63% larger when using REML ITS compared with OLS ITS analysis (Table 3).The patterns were similar for the standard errors of the meta-analytic slope-change per month (dashed boxes).

Confidence intervals of level-and slope-change meta-analytic effects
The confidence interval widths of the random-effects meta-analytic level-change were similar irrespective of the ITS analysis method, or between-study variance estimator (as depicted by the confidence intervals being the width of the reference rectangle in Fig. 6 black and blue boxes, see Additional file 1: Figure S3 for random-effect meta-analysis comparisons only).However, the confidence interval widths were mostly similar or wider when the HKSJ method was used as compared with the WT confidence interval method (as depicted by confidence intervals being the width of the reference rectangle, or wider, in Fig. 6 red boxes).The confidence intervals of the random-effects meta-analytic slope-change per month were more variable than the level-change confidence interval widths; however, the patterns were the same (dashed boxes).
Fig. 4 Bland Altman plot of difference in standardised meta-analytic effect estimates (y-axis) vs average of the effect estimates (x-axis), for each pairwise comparisons of ITS analysis and meta-analysis method combination (top row of the label indicates the ITS analysis methods, bottom row indicates the meta-analysis method, e.g., OLS ITS DL MA is OLS ITS analysis with DerSimonian and Laird between-study variance meta-analysis).
The top triangle (green points) presents the immediate level-change (difference calculated as column method -row method), and the bottom triangle (blue points) presents the slope-change per month (difference calculated as row method -column method).Horizontal orange lines depict the average, dashed orange lines depict the 95% limits of agreement (calculated as the mean ± 1.96*standard deviation of the differences).Vertical grey line indicates an average of zero, while the horizontal grey line indicates a mean difference of zero.The coloured boxes indicate cells that compare ITS analysis methods when fixed-effect meta-analysis was fitted (red boxes), meta-analysis models (i.

p-values
Pairwise comparisons of the meta-analytic level-change statistical significance between REML ITS analysis and OLS ITS analysis (keeping meta-analysis method constant) ranged from substantial to almost perfect agreement, irrespective of the meta-analysis methods used (Table 4 and Additional file 1: Table S2).Similarly, the level of statistical significance agreement between comparisons of between-study variance estimators, and comparisons of confidence interval methods, ranged from substantial to almost perfect agreement.However, the agreement was systematically (slightly) lower when REML ITS analysis was used compared to OLS.In addition, the statistical significance agreement was lower when different confidence interval methods were used; this reduction in agreement was more pronounced when REML ITS analysis was used compared with OLS.The patterns were similar for the statistical significance agreement for the meta-analytic slopechange per month, which ranged from moderate to almost perfect agreement between most pairwise comparisons, irrespective of the statistical methods used.

Estimates of between-study variance
We compared the between-study variance estimates yielded by different combinations of ITS analysis methods (OLS and REML) and the between-study variance estimators (DL and REML).The median and IQR of the pairwise differences in between-study variance estimates indicated no substantial differences (Fig. 7 and Table 5).

Sensitivity analysis
In our sensitivity analysis, we excluded 16 ITS from 5 meta-analyses.The results of the sensitivity analysis did not differ substantively from the primary analyses.Details of the differences between the meta-analyses in the primary analysis and the sensitivity analysis are presented in Additional file 1: Table S3; summary results are provided in Additional file 1: Appendix 3.

Repository of ITS data
The ITS datasets analysed in this study, for which the authors gave consent (for 16 of 17 meta-analyses) are provided in an online repository: https:// doi.org/ 10. 26180/ 21280 791 [51].For each dataset, we describe the intervention and outcome examined, any changes made to the original meta-analysis to suit our purposes, and indicate for each ITS, the time, interval of time, time of interruption, segment in segmented linear regression model, the observation and its outcome type, and whether the ITS study was excluded from our sensitivity analysis.

Summary and discussion of key findings
To our knowledge, no previous studies have empirically examined implications of different statistical methods for ITS analysis and meta-analysis using real-world ITS data.We created a repository of 17 meta-analyses including 282 ITS studies.We reanalysed each ITS study using two ITS analysis methods, and then meta-analysed the level-change and slope-change effects using five metaanalysis methods.We compared the impact of using Table 2 The mean difference of effect estimates and 95% limits of agreement for the meta-analytic immediate level-change (top triangle, difference calculated as column method -row method) and slope-change per month (bottom triangle, difference calculated as row method -column method) (n = 17) The top row of the label indicates the ITS analysis methods, bottom row indicates the meta-analysis method, e.g., OLS Fixed is OLS ITS analysis and fixed-effect metaanalysis.For example, the mean meta-analytic level-change yielded by REML ITS analysis with fixed-effect meta-analysis was 0.15 higher than that yielded by OLS ITS analysis with fixed-effect meta-analysis (column 4, row 1).The mean meta-analytic slope-change per month yielded by REML ITS analysis with fixed-effect metaanalysis was 0.02 higher than that yielded by OLS ITS analysis with fixed-effect meta-analysis (column 1, row 4) different statistical methods on the meta-analytic leveland slope-change effect estimates, standard errors, confidence intervals and p-values.The results of our empirical study provide insight into the behaviour of ITS analysis and meta-analysis methods when applied to real-world ITS data.
When fixed-effect meta-analysis was used, our results indicated that there may be differences in the estimated meta-analytic effect for a given meta-analysis.However, the immediate level-change effect estimates yielded by REML ITS analysis were only slightly larger, on average, compared with OLS, which was likely driven by a single Fig. 5 Bland Altman plot of log ratio of standard errors of the standardised meta-analytic effect estimates (y-axis) vs average of the standard errors (x-axis), for each pairwise comparisons of ITS analysis and meta-analysis method combination (top row of the label indicates the ITS analysis methods, bottom row indicates the meta-analysis method, e.g., OLS ITS DL MA is OLS ITS analysis with DerSimonian and Laird between-study variance meta-analysis).The top triangle (green points) presents the immediate level-change [log ratio calculated as log(column method / row method)], and the bottom triangle (blue points) presents the slope-change per month [log ratio calculated as log(row method / column method)].Horizontal orange lines depict the average, dashed orange lines depict the 95% limits of agreement (calculated as the mean ± 1.96*standard deviation of the log(ratio)).Vertical grey line indicates an average of zero, while the horizontal grey line indicates a log(ratio) of zero.The coloured boxes indicate cells that compare meta-analysis models (i.e., fixed-vs random-effects models)[yellow boxes], ITS analysis methods when random-effects meta-analysis was used (black boxes), between-study variance estimators (blue boxes) and confidence interval methods  3 The mean ratio of standard errors and 95% limits of agreement for the meta-analytic immediate level-change (top triangle, ratio calculated as column method / row method) and slope-change per month (bottom triangle, ratio calculated as row method / column method) (n = 17) For example, on average the standard error of the meta-analytic level-change yielded by REML ITS analysis with fixed effect meta-analysis was 6% higher than the standard error yielded by OLS ITS analysis with fixed effect meta-analysis (column 6, row 1), while the standard error of the meta-analytic slope-change per month yielded by REML ITS analysis with fixed effect meta-analysis was, on average, 20% higher than the standard error yielded by OLS ITS analysis with fixed effect meta-analysis (column meta-analysis result.In addition, while on average we found unimportant differences in the estimated standard errors of the meta-analytic effects between the ITS analysis methods, for a given meta-analysis, there could be important differences.Estimated standard errors of the fixed-effect meta-analytic effects between the ITS methods have been shown (via numerical simulation [31]) to importantly differ for short series or where the Fig. 6 Pairwise comparison of confidence intervals yielded by combinations of ITS analysis (OLS or REML) and meta-analysis methods (fixed, DL + WT, DL + HKSJ, REML + WT or REML + HKSJ).Each plot contains the 17 meta-analyses' absolute difference in meta-analytic effect estimates and relative confidence intervals, ranked in order of scaled relative confidence interval width.The top triangle (green points) presents the immediate level-change, while the bottom triangle (blue points) presents the slope-change per month.The scaled relative confidence interval widths for the level-change were calculated as column method confidence interval width divided by row method confidence interval width (and row method / column method for slope-change per month), scaled such that the row method (column method in the case of slope-change per month) spans -0.5 to 0.5 (indicated by the horizontal grey lines, which form the 'reference rectangle').Confidence intervals entirely within the reference rectangle (i.e., between the horizontal grey lines) have smaller confidence intervals than the comparison (left of the vertical red line), while the confidence intervals extending beyond the reference rectangle have larger confidence intervals than the comparison (right of the vertical red line).The black confidence intervals indicate where one or both of the confidence limits were beyond the limits y-axis scale.The coloured boxes indicate cells that compare meta-analysis models (i.e., fixed-vs random-effects models)[yellow boxes], ITS analysis methods when random-effects meta-analysis was used (black boxes), between-study variance estimators (blue boxes) and confidence interval methods The top row of the label indicates the ITS analysis methods, bottom row indicates the meta-analysis method, e.g., OLS Fixed is OLS ITS analysis and fixed effect meta-analysis.We used the following adjectives to describe agreement: moderate agreement as a kappa value of 0.41-0.6,substantial agreement as a value of 0.61-0.underlying autocorrelation tends to be larger (i.e., at least 0.4).In the present dataset, some of the series were short and had autocorrelation greater than 0.4 potentially explaining the differences.When random-effects meta-analysis was used, we found that on average the estimates of the random-effects meta-analytic effects of level-and slope-changes and their standard errors, were not impacted by the choice of random-effects meta-analysis method, irrespective of the ITS analysis method used.As expected, however, the standard errors were substantially larger compared with a fixed-effect model, due to the between-ITS variance (which was commonly estimated as greater than zero) being accounted for in the random-effects model.Furthermore, we found that the between-study variance estimates did not systematically differ by ITS analysis method or between-study variance estimator; which has been observed in other studies [29,31].However, the Fig. 7 Pairwise comparisons of the between-study variance estimates ( τ 2 ) yielded by combinations of ITS analysis methods and between-study variance estimators.The between-study variance estimate yielded by the row method (y-axis) versus the between-study variance estimate yielded by the column method (x-axis), for the level-change (top triangle) and slope-change per month (bottom triangle) meta-analyses.DL DerSimonian and Laird, HKSJ Hartung-Knapp/Sidik-Jonkman, ITS interrupted time series, OLS ordinary least squares, REML restricted maximum likelihood confidence interval method was shown to impact the confidence interval widths and statistical significance of the meta-analytic level-changes.This was primarily driven by the use of the t-distribution in the calculation of the confidence interval limits when using the HKSJ confidence interval method, rather than the small sample adjustment to the meta-analytic standard error.The consequence of wider confidence intervals and more conservative p-values when using HKSJ compared to WT, is that the conclusions drawn from the meta-analysis may differ.

Strengths and limitations
Our study has several strengths.We examined ten statistical analysis combinations, which we compared using the metrics typically important to researchers undertaking meta-analysis, i.e., the meta-analytic point estimates, between-study variance estimates, confidence intervals, and p-values.Furthermore, the included systematic reviews and meta-analyses varied by the types of interruptions examined, the outcomes, the number of included studies per meta-analysis, and the number of datapoints per ITS study.The repository of ITS datasets has been made publicly available in an online repository, facilitating future methodological and statistical research.
Our study has several limitations.We were able to obtain raw ITS data from 17 of the 40 reviews included in our methodological review.While a small number of datasets is common in empirical methodological research [46,[54][55][56][57], this hinders examination of factors that may modify how the methods compare (e.g., the number of studies per meta-analysis).Furthermore, with a small number of datasets, outliers have more influence and parameters (such as the limits of agreement) are estimated with more uncertainty.In addition, we made several assumptions when analysing the ITS studies which may not hold (e.g., assuming count outcomes were continuous); we did not adjust for potential confounders (that may have been adjusted for in the original analysis); and, we fitted a segmented linear regression model with lag-1 autocorrelation (which may have differed to the original analysis and may not have provided the best fit).However, for reasons of feasibility and our interest in comparing the statistical methods and not in addressing the research question examined in the original metaanalysis, meant that we did not assess the fit or modify the models for the 282 included ITS studies.

Implications for practice
We have demonstrated that the statistical methods for ITS analysis and meta-analysis do not, on average, impact the meta-analytic level-and slope-change effect estimates, their standard errors or the between-study variance estimates.However, across ITS analysis methods, for any given meta-analysis, there could be small to moderate differences in meta-analytic effect estimates, and important differences in the meta-analytic standard errors.Furthermore, the confidence intervals and p-values may be impacted.This demonstrates that in practice the statistical methods choices we have investigated may materially impact the results and conclusions, and the methods should therefore not be considered interchangeable.In this circumstance, numerical simulation studies provide the best evidence as to which methods are optimal under different scenarios (e.g., meta-analyses including short series), and we refer readers to our companion numerical simulation study for recommendations [31].Furthermore, given the choice of methods can impact the results, it is even more important that the specific ITS analysis and meta-analysis methods used are reported.A systematic review examining the statistical methods used in meta-analysis of ITS studies found that Table 5 The median and IQR for the differences in between-study variance estimates for the meta-analytic immediate level-change (top triangle, difference calculated as column method -row method) and slope-change per month (bottom triangle, difference calculated as row method -column method) (n = 17) The top row of the label indicates the ITS analysis methods, bottom row indicates the meta-analysis method, e.g., OLS Fixed is OLS ITS analysis and fixed effect metaanalysis.For example, the median between-study variance for immediate level-change yielded by REML ITS analysis with REML estimator 0.08 higher than that yielded by OLS ITS analysis with DL estimator (column 4, row 1) DL DerSimonian and Laird, HKSJ Hartung-Knapp/Sidik-Jonkman, ITS interrupted time series, IQR interquartile range, MA meta-analysis, OLS ordinary least squares, REML restricted maximum likelihood, WT Wald-type

Implications for future research
Our ITS data repository may be expanded, facilitating other methodological and statistical research.Our research could be extended to examine the impact of ITS methods for analysing other outcome types, particularly count outcomes, due to their frequent use in ITS studies.Furthermore, our examinations could be expanded to accommodate increasing autocorrelation lags and seasonal patterns.In addition, we have not examined the impacts of the statistical methods on meta-analytic effect prediction intervals, which provide a predicted range for the true interruption effect in an individual study, and are a critical tool for decision-making [58].Understanding the implications of statistical method choice on the prediction intervals is an important next step given the known impact of the ITS analysis methods on the estimation of between-study variance [31].

Conclusions
We found on average minimal impact of statistical method choice on the meta-analysis effect estimates, their standard errors or the between-study variance estimates.However, across ITS analysis methods, for any given meta-analysis, there could be small to moderate differences in meta-analytic effect estimates, and important differences in the meta-analytic standard errors.Furthermore, we found that confidence intervals and p-values could vary according to the choice of statistical method.These differences may materially impact the results and conclusions and suggest that the statistical methods are not interchangeable in practice.

Fig. 1 A
Fig. 1 A Six interrupted time series (ITS) studies examining the effect of supermarket policies on purchases of common checkout foods [11].The crosses represent data points, the solid lines represent the pre-and post-interruption trend lines and the dashed line represents the counterfactual trend line.The vertical dashed green line indicates the time of the interruption.B Forest plots depicting study-level and meta-analysis estimates of immediate level-change (left) and slope-change (right).ITS interrupted time series Fig.4 Bland Altman plot of difference in standardised meta-analytic effect estimates (y-axis) vs average of the effect estimates (x-axis), for each pairwise comparisons of ITS analysis and meta-analysis method combination (top row of the label indicates the ITS analysis methods, bottom row indicates the meta-analysis method, e.g., OLS ITS DL MA is OLS ITS analysis with DerSimonian and Laird between-study variance meta-analysis).The top triangle (green points) presents the immediate level-change (difference calculated as column method -row method), and the bottom triangle (blue points) presents the slope-change per month (difference calculated as row method -column method).Horizontal orange lines depict the average, dashed orange lines depict the 95% limits of agreement (calculated as the mean ± 1.96*standard deviation of the differences).Vertical grey line indicates an average of zero, while the horizontal grey line indicates a mean difference of zero.The coloured boxes indicate cells that compare ITS analysis methods when fixed-effect meta-analysis was fitted (red boxes), meta-analysis models (i.e., fixed-vs random-effects models)[yellow boxes], between-study variance estimators (i.e., using DL or REML)[blue boxes], and combinations of between-study variance estimators and ITS analysis methods (black boxes).The solid coloured boxes indicate comparisons of level-change and dashed boxes indicate slope-change per month.DL DerSimonian and Laird, HKSJ Hartung-Knapp/Sidik-Jonkman, ITS interrupted time series, MA meta-analysis, OLS ordinary least squares, REML restricted maximum likelihood, WT Wald-type Fig.5 Bland Altman plot of log ratio of standard errors of the standardised meta-analytic effect estimates (y-axis) vs average of the standard errors (x-axis), for each pairwise comparisons of ITS analysis and meta-analysis method combination (top row of the label indicates the ITS analysis methods, bottom row indicates the meta-analysis method, e.g., OLS ITS DL MA is OLS ITS analysis with DerSimonian and Laird between-study variance meta-analysis).The top triangle (green points) presents the immediate level-change [log ratio calculated as log(column method / row method)], and the bottom triangle (blue points) presents the slope-change per month [log ratio calculated as log(row method / column method)].Horizontal orange lines depict the average, dashed orange lines depict the 95% limits of agreement (calculated as the mean ± 1.96*standard deviation of the log(ratio)).Vertical grey line indicates an average of zero, while the horizontal grey line indicates a log(ratio) of zero.The coloured boxes indicate cells that compare meta-analysis models (i.e., fixed-vs random-effects models)[yellow boxes], ITS analysis methods when random-effects meta-analysis was used (black boxes), between-study variance estimators (blue boxes) and confidence interval methods (red boxes).The solid coloured boxes indicate comparisons of level-change and dashed boxes indicate slope-change per month.DL DerSimonian and Laird, HKSJ Hartung-Knapp/Sidik-Jonkman, ITS interrupted time series, OLS ordinary least squares, REML restricted maximum likelihood, WT Wald-type Fig.6 Pairwise comparison of confidence intervals yielded by combinations of ITS analysis (OLS or REML) and meta-analysis methods (fixed, DL + WT, DL + HKSJ, REML + WT or REML + HKSJ).Each plot contains the 17 meta-analyses' absolute difference in meta-analytic effect estimates and relative confidence intervals, ranked in order of scaled relative confidence interval width.The top triangle (green points) presents the immediate level-change, while the bottom triangle (blue points) presents the slope-change per month.The scaled relative confidence interval widths for the level-change were calculated as column method confidence interval width divided by row method confidence interval width (and row method / column method for slope-change per month), scaled such that the row method (column method in the case of slope-change per month) spans -0.5 to 0.5 (indicated by the horizontal grey lines, which form the 'reference rectangle').Confidence intervals entirely within the reference rectangle (i.e., between the horizontal grey lines) have smaller confidence intervals than the comparison (left of the vertical red line), while the confidence intervals extending beyond the reference rectangle have larger confidence intervals than the comparison (right of the vertical red line).The black confidence intervals indicate where one or both of the confidence limits were beyond the limits y-axis scale.The coloured boxes indicate cells that compare meta-analysis models (i.e., fixed-vs random-effects models)[yellow boxes], ITS analysis methods when random-effects meta-analysis was used (black boxes), between-study variance estimators (blue boxes) and confidence interval methods (red boxes).The solid coloured boxes indicate comparisons of level-change and dashed boxes indicate slope-change per month.See Additional file 1: FigureS3for random-effect meta-analysis comparisons only.DL DerSimonian and Laird, HKSJ Hartung-Knapp/Sidik-Jonkman, ITS interrupted time series, OLS ordinary least squares, REML restricted maximum likelihood, WT Wald-type
29%).The autocorrelation of the ITS studies estimated by REML ITS analysis had a median of 0.22 (IQR: 0.00, 0.48, n = 282), while the average estimate of autocorrelation at the meta-analysis level had a median of 0.17 (IQR: 0.13, 0.42).